Aspirations as reference points: an experimental investigation of risk behavior over time
This paper examines the importance of aspirations as reference points in a multi-period decision-making context. After stating their personal aspiration level, 172 individuals made six sequential decisions among risky prospects as part of a choice experiment. The results show that individuals make different risky choices in a multi-period setting than in a single-period setting. In particular, individuals’ aspiration level is their main reference point during the early stages of decision-making, while their starting status (wealth level at the start of the experiment) becomes the central reference point during the later stages of their multi-period decision-making.
Arvid O. I. Hoffmann; Sam F. Henry; Nikos Kalogera
Partitioning Strategies for Concurrent Programming
This work presents four partitioning strategies, or patterns, useful for decomposing a serial application into multiple concurrently executing parts. These partitioning strategies augment the commonly used task and data parallel design patterns by recognizing that applications are spatiotemporal in nature. Therefore, data and instruction decomposition are further distinguished by whether the partitioning is done in the spatial or in temporal dimension. Thus, this work describes four decomposition strategies: spatial data partitioning (SDP), temporal data partitioning (TDP), spatial instruction partitioning (SIP), and temporal instruction partitioning (TIP), while cataloging the benefits and drawbacks of each. In addition, the practical use of these strategies is demonstrated through a case study in which they are applied to implement several different parallelizations of a multicore H.264 encoder for HD video. This case study illustrates both the application of the patterns and their effects on the performance of the encoder
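As a rough illustration of the data-partitioning half of this taxonomy, the sketch below contrasts spatial and temporal data partitioning on a toy sequence of frames. It is a minimal sketch under stated assumptions, not the paper's encoder: the function names, the `process` stand-in, and the use of a thread pool are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process(pixels):
    # Hypothetical stand-in for real per-pixel work (e.g. one encoder stage).
    return [p * 2 for p in pixels]


def spatial_data_partitioning(frames, workers=2):
    # Spatial: each frame is split into regions, and the regions of one
    # frame are processed concurrently by different workers.
    out = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for frame in frames:
            size = max(1, -(-len(frame) // workers))  # ceiling division
            regions = [frame[i:i + size] for i in range(0, len(frame), size)]
            parts = pool.map(process, regions)  # results keep region order
            out.append([p for part in parts for p in part])
    return out


def temporal_data_partitioning(frames, workers=2):
    # Temporal: whole frames (points in time) are handed to different workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process, frames))


frames = [[1, 2, 3, 4], [5, 6, 7, 8]]
sdp = spatial_data_partitioning(frames)
tdp = temporal_data_partitioning(frames)
```

Both functions compute the same result here; they differ only in which dimension of the data is divided among workers, which is the distinction the abstract draws.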
Approximation Algorithms for Scheduling with Resource and Precedence Constraints
We study non-preemptive scheduling problems on identical parallel machines and uniformly related machines under both resource constraints and general precedence constraints between jobs. Our first result is an O(log n)-approximation algorithm for the objective of minimizing the makespan on parallel identical machines under resource and general precedence constraints. We then use this result as a subroutine to obtain an O(log n)-approximation algorithm for the more general objective of minimizing the total weighted completion time on parallel identical machines under both constraints. Finally, we present an O(log m log n)-approximation algorithm for scheduling under these constraints on uniformly related machines. We show that these results can all be generalized to include the case where each job has a release time. This is the first upper bound on the approximability of this class of scheduling problems where both resource and general precedence constraints must be satisfied simultaneously
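To make the problem setting concrete, the sketch below schedules jobs on identical machines subject to a shared resource capacity and precedence constraints. It uses a plain greedy list-scheduling heuristic, which is not the approximation algorithm from the paper and carries no approximation guarantee here; the job encoding and the function name `list_schedule` are assumptions made for illustration.

```python
def list_schedule(jobs, machines, capacity):
    """Greedy list scheduling (illustrative heuristic, not the paper's algorithm).

    jobs: dict mapping name -> (duration, resource_demand, [predecessor names]).
    Returns a dict mapping each job name to its completion time.
    """
    finish = {}            # job -> completion time
    running = []           # list of (end_time, name, resource_demand)
    pending = list(jobs)   # priority order = insertion order
    now = 0
    while pending or running:
        # Retire every job that has completed by the current time.
        for end, name, _ in [r for r in running if r[0] <= now]:
            finish[name] = end
        running = [r for r in running if r[0] > now]
        used = sum(demand for _, _, demand in running)
        # Start each pending job that has a free machine, enough resource,
        # and all of its predecessors finished.
        for name in list(pending):
            duration, demand, preds = jobs[name]
            if (len(running) < machines
                    and used + demand <= capacity
                    and all(p in finish for p in preds)):
                running.append((now + duration, name, demand))
                used += demand
                pending.remove(name)
        if running:
            now = min(end for end, _, _ in running)  # jump to next completion
        elif pending:
            raise ValueError("no feasible schedule (cycle or demand > capacity)")
    return finish


jobs = {
    "a": (2, 1, []),
    "b": (3, 2, []),
    "c": (2, 2, ["a"]),
    "d": (1, 1, ["b", "c"]),
}
finish = list_schedule(jobs, machines=2, capacity=3)
makespan = max(finish.values())
```

In this toy instance job `c` is ready at time 2 but must wait until time 3, because `b` still holds two of the three resource units; this interaction between resource and precedence constraints is what makes the problem class hard.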
Remote Store Programming: Mechanisms and Performance
This paper presents remote store programming (RSP). This paradigm combines usability and efficiency through the exploitation of a simple hardware mechanism, the remote store, which can easily be added to existing multicores. Remote store programs are marked by fine-grained and one-sided communication which results in a stream of data flowing from the registers of a sending process to the cache of a destination process. The RSP model and its hardware implementation trade a relatively high store latency for a low load latency because loads are more common than stores, and it is easier to tolerate store latency than load latency. This paper demonstrates the performance advantages of remote store programming by comparing it to both cache-coherent shared memory and direct memory access (DMA) based approaches using the TILEPro64 processor. The paper studies two applications: a two-dimensional Fast Fourier Transform (2D FFT) and an H.264 encoder for high-definition video. For a 2D FFT using 56 cores, RSP is 1.64x faster than DMA and 4.4x faster than shared memory. For an H.264 encoder using 40 cores, RSP achieves the same performance as DMA and 4.8x the performance of shared memory. Along with these performance advantages, RSP requires the least hardware support of the three. RSP's features, performance, and hardware simplicity make it well suited to the embedded processing domain
Quantifying Wraparound Health Insurance Needs among Employed People with Disabilities
A presentation about insurance coverage for health care services and supports for people with disabilities who work. “Wrap-around” coverage, or other policy options, may be a viable solution that supports employment among people with disabilities.
Presentation for the 2015 Academy Health Disability Research Interest Group
Managing performance vs. accuracy trade-offs with loop perforation
Many modern computations (such as video and audio encoders, Monte Carlo simulations, and machine learning algorithms) are designed to trade off accuracy in return for increased performance. To date, such computations typically use ad-hoc, domain-specific techniques developed specifically for the computation at hand. Loop perforation provides a general technique to trade accuracy for performance by transforming loops to execute a subset of their iterations. A criticality testing phase filters out critical loops (whose perforation produces unacceptable behavior) to identify tunable loops (whose perforation produces more efficient and still acceptably accurate computations). A perforation space exploration algorithm perforates combinations of tunable loops to find Pareto-optimal perforation policies. Our results indicate that, for a range of applications, this approach typically delivers performance increases of over a factor of two (and up to a factor of seven) while changing the result that the application produces by less than 10%
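The core transformation can be illustrated with a minimal sketch: a loop that computes a mean, and a perforated variant that executes only every `stride`-th iteration. The function names and the choice of a mean computation are assumptions for illustration; the paper applies the idea automatically to loops in real applications.

```python
def mean_full(values):
    # Original loop: visits every element.
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def mean_perforated(values, stride=2):
    # Perforated loop: executes only every `stride`-th iteration,
    # doing roughly 1/stride of the original work.
    total = 0.0
    count = 0
    for i in range(0, len(values), stride):
        total += values[i]
        count += 1
    return total / count


def relative_error(exact, approx):
    # Accuracy metric used to judge whether the perforation is acceptable.
    return abs(exact - approx) / abs(exact)


values = list(range(1, 1001))
exact = mean_full(values)
approx = mean_perforated(values, stride=2)
error = relative_error(exact, approx)
```

On this input the perforated loop runs half the iterations and the result changes by well under 10%. A loop whose perforation pushed the error past the acceptable bound would be classified as critical rather than tunable.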
Quality of service profiling
Many computations exhibit a trade-off between execution time and quality of service. A video encoder, for example, can often encode frames more quickly if it is given the freedom to produce slightly lower quality video. A developer attempting to optimize such computations must navigate a complex trade-off space to find optimizations that appropriately balance quality of service and performance.
We present a new quality of service profiler that is designed to help developers identify promising optimization opportunities in such computations. In contrast to standard profilers, which simply identify time-consuming parts of the computation, a quality of service profiler is designed to identify subcomputations that can be replaced with new (and potentially less accurate) subcomputations that deliver significantly increased performance in return for acceptably small quality of service losses.
Our quality of service profiler uses loop perforation (which transforms loops to perform fewer iterations than the original loop) to obtain implementations that occupy different points in the performance/quality of service trade-off space. The rationale is that optimizable computations often contain loops that perform extra iterations, and that removing iterations, then observing the resulting effect on the quality of service, is an effective way to identify such optimizable subcomputations. Our experimental results from applying our implemented quality of service profiler to a challenging set of benchmark applications show that it can enable developers to identify promising optimization opportunities and deliver successful optimizations that substantially increase the performance with only small quality of service losses
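The profiling idea described above can be sketched as follows: perforate one loop at a time, then record how much work is saved and how much the output degrades. This is a simplified illustration under stated assumptions, not the authors' tool: the two sample loops, the `profile` function, and the use of iteration counts as a proxy for execution time are all hypothetical.

```python
def sum_of_squares(data, stride=1):
    # Candidate loop 1; the result is scaled by `stride` to compensate
    # for the skipped iterations.
    total, iters = 0.0, 0
    for i in range(0, len(data), stride):
        total += data[i] ** 2
        iters += 1
    return total * stride, iters


def find_max(data, stride=1):
    # Candidate loop 2; a perforated search may miss the true maximum.
    best, iters = data[0], 0
    for i in range(0, len(data), stride):
        best = max(best, data[i])
        iters += 1
    return best, iters


def profile(loops, data, stride=2):
    # Perforate each loop in turn and report work saved versus QoS loss.
    report = {}
    for name, loop in loops.items():
        exact, full_iters = loop(data, 1)
        approx, perf_iters = loop(data, stride)
        report[name] = {
            "work_saved": 1 - perf_iters / full_iters,
            "qos_loss": abs(exact - approx) / abs(exact),
        }
    return report


data = list(range(1, 101))
report = profile({"sum_of_squares": sum_of_squares, "find_max": find_max}, data)
```

A developer would read such a report looking for loops with a large `work_saved` and a small `qos_loss`, which mark the subcomputations worth optimizing by hand.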
Application Heartbeats for Software Performance and Health
Adaptive, or self-aware, computing has been proposed as one method to help application programmers confront the growing complexity of multicore software development. However, existing approaches to adaptive systems are largely ad hoc and often do not manage to incorporate the true performance goals of the applications they are designed to support. This paper presents an enabling technology for adaptive computing systems: Application Heartbeats. The Application Heartbeats framework provides a simple, standard programming interface that applications can use to indicate their performance and system software (and hardware) can use to query an application's performance. Several experiments demonstrate the simplicity and efficacy of the Application Heartbeat approach. First, the PARSEC benchmark suite is instrumented with Application Heartbeats to show the broad applicability of the interface. Then, an adaptive H.264 encoder is developed to show how applications might use Application Heartbeats internally. Next, an external resource scheduler is developed which assigns cores to an application based on its performance as specified with Application Heartbeats. Finally, the adaptive H.264 encoder is used to illustrate how Application Heartbeats can aid fault tolerance
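The general shape of such an interface can be sketched as below: the application emits a beat at each unit of progress, and an external observer queries the resulting rate. This is a minimal sketch and not the actual Application Heartbeats API; the `Heartbeat` class, its methods, and the `cores_needed` policy are hypothetical names introduced only for illustration.

```python
import time
from collections import deque


class Heartbeat:
    """Hypothetical heartbeat recorder (not the real framework's API)."""

    def __init__(self, window=20):
        # Keep only the most recent `window` timestamps.
        self._stamps = deque(maxlen=window)

    def beat(self, now=None):
        # Called by the application once per unit of progress (e.g. per frame).
        self._stamps.append(time.monotonic() if now is None else now)

    def rate(self):
        # Called by an observer: beats per second over the current window.
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


def cores_needed(heartbeat, target_rate, cores, max_cores):
    # Hypothetical external scheduler policy: add a core while the
    # application runs below its target heart rate.
    if heartbeat.rate() < target_rate and cores < max_cores:
        return cores + 1
    return cores


hb = Heartbeat()
for t in [0.0, 0.5, 1.0, 1.5, 2.0]:  # explicit timestamps for a repeatable demo
    hb.beat(now=t)
current_rate = hb.rate()
new_cores = cores_needed(hb, target_rate=4.0, cores=2, max_cores=8)
```

Separating `beat` (producer side) from `rate` (observer side) mirrors the abstract's point that the same signal can serve the application internally and an external scheduler.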